[Figures: a plot with y-axis 'Density' and two plots with y-axis 'Sale Price per Square Foot']
RuntimeWarning (numpy): invalid value encountered in multiply (y *= step), in add (y += start) and in greater (large = s > cutoff)
(1, 62)
[Figures: three plots with y-axis 'Sale Price']
RuntimeWarning (pandas, series.py): divide by zero encountered in log
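The divide-by-zero warning arises when the log is taken of a column that contains zeros, because log(0) is -inf. A minimal sketch of one common workaround, np.log1p (log(1 + x)), which is finite at zero and is inverted by np.expm1. The column below is hypothetical, and whether log1p suits the original feature is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing zeros (e.g. an area that is 0 when the feature is absent)
area = pd.Series([0.0, 120.0, 450.0])

with np.errstate(divide="ignore"):
    naive = np.log(area)   # log(0) -> -inf; this is what triggers the RuntimeWarning

safe = np.log1p(area)      # log(1 + x): 0 maps to 0.0, no warning
restored = np.expm1(safe)  # inverse transform back to the original scale

print(naive.tolist())
print(safe.tolist())
```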
MSSubClass (type of dwelling):
20 1-STORY 1946 & NEWER ALL STYLES
30 1-STORY 1945 & OLDER
40 1-STORY W/FINISHED ATTIC ALL AGES
45 1-1/2 STORY - UNFINISHED ALL AGES
50 1-1/2 STORY FINISHED ALL AGES
60 2-STORY 1946 & NEWER
70 2-STORY 1945 & OLDER
75 2-1/2 STORY ALL AGES
80 SPLIT OR MULTI-LEVEL
85 SPLIT FOYER
90 DUPLEX - ALL STYLES AND AGES
120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER
150 1-1/2 STORY PUD - ALL AGES
160 2-STORY PUD - 1946 & NEWER
180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
190 2 FAMILY CONVERSION - ALL STYLES AND AGES
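The regression further down includes an IsPUD flag, but the source does not show how it was built. A minimal sketch, assuming it marks the MSSubClass codes whose description mentions PUD (120, 150, 160 and 180); MS_SUBCLASS, PUD_CODES and add_is_pud are hypothetical names.

```python
import pandas as pd

# MSSubClass codes and descriptions, copied from the list above
MS_SUBCLASS = {
    20: "1-STORY 1946 & NEWER ALL STYLES",
    30: "1-STORY 1945 & OLDER",
    40: "1-STORY W/FINISHED ATTIC ALL AGES",
    45: "1-1/2 STORY - UNFINISHED ALL AGES",
    50: "1-1/2 STORY FINISHED ALL AGES",
    60: "2-STORY 1946 & NEWER",
    70: "2-STORY 1945 & OLDER",
    75: "2-1/2 STORY ALL AGES",
    80: "SPLIT OR MULTI-LEVEL",
    85: "SPLIT FOYER",
    90: "DUPLEX - ALL STYLES AND AGES",
    120: "1-STORY PUD (Planned Unit Development) - 1946 & NEWER",
    150: "1-1/2 STORY PUD - ALL AGES",
    160: "2-STORY PUD - 1946 & NEWER",
    180: "PUD - MULTILEVEL - INCL SPLIT LEV/FOYER",
    190: "2 FAMILY CONVERSION - ALL STYLES AND AGES",
}

# Assumption: IsPUD = 1 for every code whose description mentions PUD
PUD_CODES = {code for code, desc in MS_SUBCLASS.items() if "PUD" in desc}


def add_is_pud(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 0/1 IsPUD column derived from MSSubClass."""
    out = df.copy()
    out["IsPUD"] = out["MSSubClass"].isin(PUD_CODES).astype(int)
    return out


homes = pd.DataFrame({"MSSubClass": [20, 120, 160, 60, 180]})
print(sorted(PUD_CODES))                    # [120, 150, 160, 180]
print(add_is_pud(homes)["IsPUD"].tolist())  # [0, 1, 1, 0, 1]
```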
Functional (home functionality):
Typ Typical Functionality
Min1 Minor Deductions 1
Min2 Minor Deductions 2
Mod Moderate Deductions
Maj1 Major Deductions 1
Maj2 Major Deductions 2
Sev Severely Damaged
Sal Salvage only
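The regression further down uses two dummies, Funct_3_ModToSev and Funct_3_Normal, which suggests the Functional codes were collapsed into three levels with one level dropped as the baseline. A sketch under that assumption: the grouping (Typ as Normal, Min1/Min2 as a hypothetical Minor level, Mod through Sal as ModToSev) is a guess, and FUNCT_3 and funct_3_dummies are illustrative names. With drop_first=True, pandas drops the alphabetically first level present in the data, so Minor becomes the baseline here.

```python
import pandas as pd

# Assumed 3-level grouping of the Functional codes listed above
FUNCT_3 = {
    "Typ": "Normal",
    "Min1": "Minor", "Min2": "Minor",
    "Mod": "ModToSev", "Maj1": "ModToSev", "Maj2": "ModToSev",
    "Sev": "ModToSev", "Sal": "ModToSev",
}


def funct_3_dummies(functional: pd.Series) -> pd.DataFrame:
    """One-hot encode the grouped levels; the alphabetically first level present is dropped."""
    grouped = functional.map(FUNCT_3)
    return pd.get_dummies(grouped, prefix="Funct_3", drop_first=True, dtype=int)


dummies = funct_3_dummies(pd.Series(["Typ", "Min2", "Maj1", "Sal"]))
print(dummies)
```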
OLS Regression Results
==============================================================================
Dep. Variable: SalePrice R-squared: 0.917
Model: OLS Adj. R-squared: 0.913
Method: Least Squares F-statistic: 279.5
Date: Mon, 30 Nov 2020 Prob (F-statistic): 0.00
Time: 11:58:04 Log-Likelihood: 1762.8
No. Observations: 2243 AIC: -3354.
Df Residuals: 2157 BIC: -2862.
Df Model: 85
Covariance Type: nonrobust
=========================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------
const 8.2923 0.104 79.768 0.000 8.088 8.496
Distance -0.0149 0.003 -4.367 0.000 -0.022 -0.008
Alley -0.0089 0.012 -0.776 0.438 -0.032 0.014
OverallQual 0.0668 0.003 20.231 0.000 0.060 0.073
OverallCond 0.0395 0.003 14.028 0.000 0.034 0.045
TotRmsAbvGrd 0.0314 0.003 12.191 0.000 0.026 0.036
Fireplaces 0.0455 0.008 5.865 0.000 0.030 0.061
GarageArea 0.0002 1.85e-05 12.121 0.000 0.000 0.000
MoSold -0.0006 0.001 -0.688 0.491 -0.002 0.001
MasVnrArea2 0.0121 0.006 2.037 0.042 0.000 0.024
total_LivArea 0.2857 0.013 22.525 0.000 0.261 0.311
num_bathroom 0.0199 0.005 3.846 0.000 0.010 0.030
BldgAge -0.0012 0.000 -4.977 0.000 -0.002 -0.001
Remodeled -0.0108 0.006 -1.804 0.071 -0.023 0.001
IsPUD 0.0143 0.052 0.277 0.782 -0.087 0.116
LotIsReg -0.0128 0.006 -2.314 0.021 -0.024 -0.002
HillORDepr 0.0504 0.011 4.548 0.000 0.029 0.072
PosFeat 0.0194 0.017 1.174 0.240 -0.013 0.052
BsmtQual_num 0.0151 0.003 5.176 0.000 0.009 0.021
KitchenQual_num 0.0151 0.003 5.465 0.000 0.010 0.021
FireplaceQu_num 0.0020 0.001 1.404 0.161 -0.001 0.005
GarageQual_num 0.0019 0.006 0.326 0.744 -0.009 0.013
BsmtCond_num -0.0009 0.004 -0.263 0.793 -0.008 0.006
GarageCond_num 0.0063 0.006 1.013 0.311 -0.006 0.018
HeatingQC_num 0.0065 0.002 3.884 0.000 0.003 0.010
TotalPorchSF 0.0035 0.001 2.524 0.012 0.001 0.006
HasFence -0.0086 0.006 -1.332 0.183 -0.021 0.004
MSZoning_FV 0.2530 0.037 6.924 0.000 0.181 0.325
MSZoning_I (all) 0.2445 0.124 1.970 0.049 0.001 0.488
MSZoning_RH 0.1519 0.045 3.401 0.001 0.064 0.240
MSZoning_RL 0.1978 0.033 5.966 0.000 0.133 0.263
MSZoning_RM 0.1382 0.033 4.194 0.000 0.074 0.203
BldgType_2fmCon -0.0533 0.019 -2.822 0.005 -0.090 -0.016
BldgType_Duplex -0.1048 0.017 -6.217 0.000 -0.138 -0.072
BldgType_Twnhs -0.1277 0.053 -2.393 0.017 -0.232 -0.023
BldgType_TwnhsE -0.0787 0.052 -1.514 0.130 -0.181 0.023
HouseStyle_1.5Unf 0.0225 0.028 0.806 0.421 -0.032 0.077
HouseStyle_1Story 0.0184 0.010 1.849 0.065 -0.001 0.038
HouseStyle_2.5Fin -0.0717 0.049 -1.473 0.141 -0.167 0.024
HouseStyle_2.5Unf 0.0048 0.029 0.167 0.868 -0.052 0.061
HouseStyle_2Story -0.0349 0.010 -3.466 0.001 -0.055 -0.015
HouseStyle_SFoyer -0.0386 0.020 -1.974 0.048 -0.077 -0.000
HouseStyle_SLvl -0.0544 0.015 -3.542 0.000 -0.084 -0.024
Foundation_CBlock 0.0117 0.011 1.087 0.277 -0.009 0.033
Foundation_PConc 0.0309 0.012 2.519 0.012 0.007 0.055
Foundation_Slab 0.0067 0.030 0.227 0.820 -0.051 0.065
Foundation_Stone 0.0419 0.042 0.994 0.320 -0.041 0.124
Foundation_Wood 0.0632 0.059 1.078 0.281 -0.052 0.178
BsmtExposure_Gd 0.0407 0.011 3.762 0.000 0.019 0.062
BsmtExposure_Mn -0.0253 0.011 -2.281 0.023 -0.047 -0.004
BsmtExposure_No -0.0213 0.008 -2.612 0.009 -0.037 -0.005
CentralAir_Y 0.0332 0.013 2.493 0.013 0.007 0.059
Electrical_FuseF -0.0363 0.022 -1.622 0.105 -0.080 0.008
Electrical_FuseP -0.0007 0.046 -0.015 0.988 -0.092 0.090
Electrical_SBrkr -0.0172 0.010 -1.649 0.099 -0.038 0.003
GarageType_Attchd 0.0436 0.028 1.564 0.118 -0.011 0.098
GarageType_Basment 0.0333 0.037 0.901 0.368 -0.039 0.106
GarageType_BuiltIn 0.0358 0.030 1.201 0.230 -0.023 0.094
GarageType_CarPort -0.0040 0.051 -0.078 0.938 -0.105 0.097
GarageType_Detchd 0.0284 0.028 1.026 0.305 -0.026 0.083
GarageType_None -0.0285 0.122 -0.233 0.816 -0.269 0.212
GarageFinish_None 0.1134 0.125 0.909 0.364 -0.131 0.358
GarageFinish_RFn 0.0020 0.007 0.281 0.779 -0.012 0.016
GarageFinish_Unf 0.0065 0.009 0.760 0.448 -0.010 0.023
PavedDrive_P 0.0078 0.019 0.419 0.675 -0.029 0.044
PavedDrive_Y 0.0340 0.012 2.865 0.004 0.011 0.057
SaleCondition_AdjLand 0.0589 0.084 0.699 0.485 -0.106 0.224
SaleCondition_Alloca 0.1287 0.069 1.863 0.063 -0.007 0.264
SaleCondition_Family -0.0854 0.033 -2.592 0.010 -0.150 -0.021
SaleCondition_Normal 0.0580 0.016 3.531 0.000 0.026 0.090
SaleCondition_Partial 0.1109 0.021 5.189 0.000 0.069 0.153
SchD_S_5 0.0336 0.009 3.541 0.000 0.015 0.052
ExtMatl_AsphShn 0.0838 0.119 0.705 0.481 -0.149 0.317
ExtMatl_BrkFace 0.0890 0.030 2.973 0.003 0.030 0.148
ExtMatl_CBlock -0.1347 0.117 -1.155 0.248 -0.363 0.094
ExtMatl_HdBoard 0.0066 0.025 0.268 0.789 -0.042 0.055
ExtMatl_ImStucc 0.0347 0.116 0.300 0.764 -0.192 0.262
ExtMatl_MetalSd 0.0380 0.024 1.574 0.116 -0.009 0.085
ExtMatl_Mixed 0.0280 0.024 1.152 0.249 -0.020 0.076
ExtMatl_Plywood 0.0028 0.026 0.107 0.915 -0.048 0.053
ExtMatl_PreCast 0.4219 0.117 3.599 0.000 0.192 0.652
ExtMatl_Stucco 0.0357 0.033 1.088 0.277 -0.029 0.100
ExtMatl_VinylSd 0.0192 0.024 0.789 0.430 -0.029 0.067
ExtMatl_Wd Sdng 0.0130 0.024 0.541 0.589 -0.034 0.060
Funct_3_ModToSev -0.0550 0.020 -2.753 0.006 -0.094 -0.016
Funct_3_Normal 0.0178 0.012 1.485 0.138 -0.006 0.041
==============================================================================
Omnibus: 490.653 Durbin-Watson: 2.068
Prob(Omnibus): 0.000 Jarque-Bera (JB): 4843.044
Skew: -0.741 Prob(JB): 0.00
Kurtosis: 10.045 Cond. No. 3.73e+04
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.73e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
P-values of the terms with p < 0.05:
const 0.000000e+00
Distance 1.317128e-05
OverallQual 1.785328e-83
OverallCond 7.581546e-43
TotRmsAbvGrd 4.128864e-33
Fireplaces 5.174094e-09
GarageArea 9.223537e-33
MasVnrArea2 4.180283e-02
total_LivArea 4.409789e-101
num_bathroom 1.234449e-04
BldgAge 6.952364e-07
LotIsReg 2.074494e-02
HillORDepr 5.706978e-06
BsmtQual_num 2.479304e-07
KitchenQual_num 5.174364e-08
HeatingQC_num 1.056605e-04
TotalPorchSF 1.168088e-02
MSZoning_FV 5.777838e-12
MSZoning_I (all) 4.897592e-02
MSZoning_RH 6.832804e-04
MSZoning_RL 2.834646e-09
MSZoning_RM 2.851459e-05
BldgType_2fmCon 4.820073e-03
BldgType_Duplex 6.061650e-10
BldgType_Twnhs 1.679243e-02
HouseStyle_2Story 5.376374e-04
HouseStyle_SFoyer 4.848411e-02
HouseStyle_SLvl 4.048050e-04
Foundation_PConc 1.182523e-02
BsmtExposure_Gd 1.732428e-04
BsmtExposure_Mn 2.262584e-02
BsmtExposure_No 9.074225e-03
CentralAir_Y 1.274814e-02
PavedDrive_Y 4.209032e-03
SaleCondition_Family 9.598061e-03
SaleCondition_Normal 4.233470e-04
SaleCondition_Partial 2.306671e-07
SchD_S_5 4.075254e-04
ExtMatl_BrkFace 2.983124e-03
ExtMatl_PreCast 3.265755e-04
Funct_3_ModToSev 5.948547e-03
dtype: float64
print(lin_reg.score(X_train, y_train))  # R^2 on the training set
print(lin_reg.score(X_test, y_test))    # R^2 on the test set
0.9167675316925197
0.8553809707773833
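The two numbers are R^2 on the training and test sets; the training value matches the OLS R-squared of 0.917 above. A minimal scikit-learn sketch of the same comparison, assuming a LinearRegression model and an 80/20 split (the data and split are hypothetical). A held-out R^2 noticeably below the training R^2, as here (0.917 vs 0.855), indicates the model fits the training rows better than new ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data with a linear signal plus noise
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = X @ np.array([1.5, -0.7, 0.0]) + rng.normal(scale=0.5, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lin_reg = LinearRegression().fit(X_train, y_train)

train_r2 = lin_reg.score(X_train, y_train)  # R^2 on rows the model was fitted to
test_r2 = lin_reg.score(X_test, y_test)     # R^2 on held-out rows
print(train_r2, test_r2)
```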
R^2 of train set: 0.9369999792245037
R^2 of test set: 0.7704724966752604
Feature importance (top 20; feature names truncated to 10 characters in the output):
0 OverallQua 0.574323
1 total_LivA 0.278932
2 num_bathro 0.031961
3 GarageArea 0.024335
4 MSZoning_R 0.013634
5 Fireplaces 0.010243
6 BldgAge 0.007780
7 OverallCon 0.006985
8 Distance 0.006923
9 BsmtQual_n 0.006482
10 TotalPorch 0.005796
11 HeatingQC_ 0.003707
12 SaleCondit 0.002729
13 SaleCondit 0.002523
14 TotRmsAbvG 0.002211
15 MoSold 0.001882
16 Funct_3_No 0.001808
17 HouseStyle 0.001675
18 FireplaceQ 0.001630
19 PavedDrive 0.001618
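The b'...' prefix and the 10-character cut in the feature names above are consistent with storing the names in a NumPy array of fixed-width bytes dtype such as 'S10', which truncates longer values silently (np.array(["OverallQual"], dtype="S10") holds b'OverallQua'). Several truncated names, such as MSZoning_R, SaleCondit, HouseStyle or GarageType, cannot be mapped back to a single column. A sketch that keeps the full names by indexing a pandas Series with the DataFrame's column names; the decision tree and the data are hypothetical stand-ins for the original model.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data with long column names
rng = np.random.default_rng(3)
X = pd.DataFrame({
    "OverallQual": rng.integers(1, 11, 300),
    "total_LivArea": rng.normal(1500, 400, 300),
    "MSZoning_RL": rng.integers(0, 2, 300),
})
y = 10 * X["OverallQual"] + 0.05 * X["total_LivArea"] + rng.normal(0, 5, 300)

tree = DecisionTreeRegressor(min_samples_leaf=8, random_state=0).fit(X, y)

# A Series keyed by the DataFrame's column names keeps the full, untruncated names
importance = (
    pd.Series(tree.feature_importances_, index=X.columns, name="importance")
    .sort_values(ascending=False)
)
print(importance.head(20))
```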
CPU times: user 3.56 s, sys: 624 ms, total: 4.18 s
Wall time: 6.97 s
Grid Search Best Parameters: {'criterion': 'mse', 'min_samples_leaf': 8, 'min_samples_split': 28}
Grid Search Best Score: 0.8268176932421467
Grid Search R2 of Train set: 0.8983554977590436
Grid Search R2 of Test set: 0.797853693953499
R^2 of train set: 0.9832981784794597
R^2 of test set: 0.8211597655778335
Feature importance (top 20; feature names truncated to 10 characters in the output):
0 total_LivA 0.128920
1 OverallQua 0.087664
2 GarageArea 0.070480
3 BldgAge 0.065392
4 num_bathro 0.055254
5 BsmtQual_n 0.049037
6 KitchenQua 0.045427
7 FireplaceQ 0.036407
8 TotRmsAbvG 0.034690
9 Fireplaces 0.034439
10 TotalPorch 0.032197
11 Foundation 0.029651
12 HeatingQC_ 0.020817
13 Distance 0.020533
14 GarageFini 0.018557
15 OverallCon 0.017835
16 GarageType 0.017016
17 MSZoning_R 0.013469
18 MasVnrArea 0.013318
19 GarageType 0.012310
CPU times: user 29.3 s, sys: 1.7 s, total: 31 s
Wall time: 1min 30s
Grid Search Best Parameters: {'criterion': 'mse', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100, 'random_state': 42}
Grid Search Best Score: 0.8792817295203168
Grid Search R2 of Train set: 0.9830955549029718
Grid Search R2 of Test set: 0.821394150770663
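The forest reaches a train R^2 of about 0.98 against about 0.82 on the test set, so the train score says little about generalization. A random forest also offers an out-of-bag (OOB) estimate without a separate CV loop: each tree is scored on the rows left out of its bootstrap sample. A sketch with scikit-learn's RandomForestRegressor on synthetic data (all values hypothetical). The grid above lists random_state as a searched parameter; fixing it on the estimator, as below, keeps runs reproducible without treating the seed as a hyperparameter.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical data with a linear and a quadratic component
rng = np.random.default_rng(5)
X = rng.normal(size=(500, 5))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# random_state is fixed on the estimator rather than searched over
forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=42)
forest.fit(X_train, y_train)

train_r2 = forest.score(X_train, y_train)  # optimistic: the trees saw these rows
oob_r2 = forest.oob_score_                 # R^2 on rows outside each tree's bootstrap sample
test_r2 = forest.score(X_test, y_test)
print(f"train {train_r2:.3f}  oob {oob_r2:.3f}  test {test_r2:.3f}")
```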
R^2 of train set: 0.17731437608580825
R^2 of test set: 0.15184788172704156
Feature importance (top 20):
0 OverallQual 0.383076
1 total_LivArea 0.359307
2 GarageArea 0.054201
3 FireplaceQu_num 0.032072
4 BldgAge 0.030127
5 KitchenQual_num 0.024112
6 OverallCond 0.018578
7 num_bathroom 0.011860
8 MSZoning_RM 0.011799
9 CentralAir_Y 0.011017
10 Fireplaces 0.007108
11 TotRmsAbvGrd 0.006971
12 TotalPorchSF 0.006776
13 MSZoning_RL 0.006041
14 GarageType_Attchd 0.004861
15 BsmtQual_num 0.004315
16 GarageCond_num 0.004168
17 Distance 0.004092
18 HeatingQC_num 0.004044
19 PavedDrive_Y 0.002125
[Figure: Feature Importance Plot of 1000-Tree GBM]
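A sketch of a feature-importance bar chart for a 1000-tree gradient boosting model, assuming scikit-learn's GradientBoostingRegressor (the source does not name the library); the learning rate, depth and data are hypothetical. feature_importances_ here is impurity-based, so it can favor continuous, high-cardinality features; permutation importance (sklearn.inspection.permutation_importance) is a common cross-check.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical data; GarageArea has no effect on y
rng = np.random.default_rng(6)
X = pd.DataFrame({
    "OverallQual": rng.integers(1, 11, 300),
    "total_LivArea": rng.normal(1500, 400, 300),
    "GarageArea": rng.normal(450, 150, 300),
})
y = 0.1 * X["OverallQual"] + 0.0003 * X["total_LivArea"] + rng.normal(0, 0.1, 300)

gbm = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.01, max_depth=3, random_state=0)
gbm.fit(X, y)

# Sort ascending so the most important feature ends up at the top of a horizontal bar chart
importance = pd.Series(gbm.feature_importances_, index=X.columns).sort_values()
fig, ax = plt.subplots(figsize=(8, 4))
importance.plot.barh(ax=ax)
ax.set_xlabel("Importance")
ax.set_title("Feature Importance Plot of 1000-Tree GBM")
fig.tight_layout()
```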
3.158590218274496e-05
|  | feature | coef |
|---|---|---|
| 2 | OverallQual | 0.068616 |
| 3 | OverallCond | 0.036761 |
| 4 | TotRmsAbvGrd | 0.028648 |
| 5 | Fireplaces | 0.043930 |
| 6 | GarageArea | 0.000215 |
| 8 | MasVnrArea2 | 0.007217 |
| 9 | total_LivArea | 0.290148 |
| 10 | num_bathroom | 0.018246 |
| 15 | HillORDepr | 0.048925 |
| 16 | PosFeat | 0.016757 |
| 17 | ExterQual_num | 0.009755 |
| 18 | BsmtQual_num | 0.012905 |
| 19 | KitchenQual_num | 0.013091 |
| 20 | FireplaceQu_num | 0.002377 |
| 24 | GarageCond_num | 0.000307 |
| 25 | HeatingQC_num | 0.005984 |
| 26 | TotalPorchSF | 0.002698 |
| 28 | MSZoning_FV | 0.101146 |
| 29 | MSZoning_I (all) | 0.002829 |
| 31 | MSZoning_RL | 0.063494 |
| 37 | HouseStyle_1.5Unf | 0.000464 |
| 38 | HouseStyle_1Story | 0.024071 |
| 45 | Foundation_PConc | 0.016294 |
| 48 | Foundation_Wood | 0.015804 |
| 49 | BsmtExposure_Gd | 0.041441 |
| 52 | CentralAir_Y | 0.045717 |
| 56 | GarageType_Attchd | 0.013846 |
| 66 | PavedDrive_Y | 0.036055 |
| 68 | SaleCondition_Alloca | 0.077351 |
| 70 | SaleCondition_Normal | 0.039659 |
| 71 | SaleCondition_Partial | 0.087334 |
| 72 | SchD_S_5 | 0.026390 |
| 73 | ExtMatl_AsphShn | 0.001317 |
| 74 | ExtMatl_BrkFace | 0.050218 |
| 78 | ExtMatl_MetalSd | 0.013079 |
| 79 | ExtMatl_Mixed | 0.003917 |
| 81 | ExtMatl_PreCast | 0.353994 |
| 86 | Funct_3_Normal | 0.016080 |
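Every coefficient in the table is positive, and the left column keeps the original column positions (2, 3, 4, 5, 6, 8, ...), so the table appears to be a coefficient DataFrame filtered with coef > 0; the source does not show the model. A minimal sketch of that filter, assuming a Lasso fit on hypothetical data. Filtering on sign drops features that lower the prediction (in the OLS above, Distance and BldgAge have negative coefficients), so ranking by absolute value is an alternative if the goal is importance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Hypothetical standardized features; GarageArea has no effect on y here
rng = np.random.default_rng(7)
X = pd.DataFrame(
    rng.normal(size=(300, 4)),
    columns=["Distance", "OverallQual", "BldgAge", "GarageArea"],
)
y = -0.3 * X["Distance"] + 0.7 * X["OverallQual"] - 0.2 * X["BldgAge"] + rng.normal(0, 0.1, 300)

model = Lasso(alpha=0.05).fit(X, y)

# Default RangeIndex records each feature's column position
coef = pd.DataFrame({"feature": X.columns, "coef": model.coef_})
positive = coef[coef["coef"] > 0]  # boolean filtering keeps the original index values
print(positive)
```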